21/3,K/1 (Item 1 from file: 275)

DIALOG(R) File 275: Gale Group Computer DB(TM)
(c) 2004 The Gale Group. All rts. reserv.

02523341 SUPPLIER NUMBER: 76734581 (USE FORMAT 7 OR 9 FOR FULL TEXT) 

Itanium Arrives, Vendors Plan Big — HP, IBM and Unisys are among the first 
to build around the first generation of IA-64 chips. (Product Information)

Bucholtz, Chris 
VARbusiness, 47 
July 23, 2001 

ISSN: 0894-5802 LANGUAGE: English RECORD TYPE: Fulltext 

WORD COUNT: 1058 LINE COUNT: 00083 

. . . will depend on how quickly software developers embrace 64-bit
computing and create applications tuned to take advantage of the improved
performance Itanium promises.

Intel predicts software makers will announce between 20 to 60 
applications in the first few months after Itanium's arrival as a 
commercial product. Before the end... 

...there should be some 400 Itanium-optimized applications.

Itanium's Family Tree

Itanium is a family of processors that will be built around the 
Explicitly Parallel Instruction Computing (EPIC) technology. Intel and 
Hewlett-Packard have announced plans for four generations of the CPU, with 
more to come in the future. 

- MERCED. . . 

...samples). Starts at an 800-MHz clock speed with a 128-bit bus, providing
a peak bandwidth of 3.2 Gbps. Includes a three-level cache, adding a
Level 0 cache. In addition, 2 MB to 4 MB of Level 2 cache is included.

- MCKINLEY To arrive in the second half of 2001. McKinley's clock 
speed will likely start at over 1 GHz, roughly doubling Merced. . . 

...notebook computer to one-inch thickness and 4 pounds in weight. The
Portege 7000CT comes standard with 266-MHz Mobile Pentium II processor, 32M
of synchronous dynamic RAM, 512K of Level 2 cache, 4.3G hard drive,
56-Kbps modem, 12.1-inch active-matrix display and Microsoft Windows 95 or
Windows 98. An optional expansion dock has a built-in floppy drive,
integrated 10/100-Mbps Ethernet adapter and 24X CD-ROM drive. Toshiba
[illegible] up to 5,800 possible configurations for its new Tecra 8000 notebook.
Starting with . . .

...configuration CD-ROM to show buyers all the choices. A third notebook,
the Satellite 4000, has a 233-MHz Mobile Pentium II processor, 512K of
Level 2 cache, 4.1G hard drive, 56-Kbps modem and 24X CD-ROM drive.
Buyers have a choice of active matrix or dual-scan 12.1-inch displays.
Toshiba has predicted about 2 1/2 hours of battery operation for the
Portege 7000CT. Both the Portege 7000CT and the Tecra 8000 will accept
Intel Corp.'s . . .

... Inc., Silicon Graphics Inc. and other workstation makers.

In the lab's tests, the 400-MHz Xeon with 512K of
error-correcting-code Level 2 cache edged an average 400-MHz Pentium II
with 512K of synchronous dynamic RAM by about 3 1/2 percent in math
benchmarks. In heavy-cache...

...The people represent the processes the computer performs. 

Every amusement park has a featured ride, usually the roller coaster. 
Xeon represents this speed demon. L2 cache is the long line in which 
visitors stand, like processes waiting to get into the Xeon. 

Level 1 cache, which is far smaller than L2, amounts to 32K in a
Pentium II. It's analogous to the front of the queue, where the visitors - or
processes - separate to be seated on the roller coaster. At each tick of the
clock, the processes hop on board to take their ride through the processor.

L2 cache used to operate at the same speed as the motherboard. For
example, in a standard Pentium II with the 100-MHz BX chip, the L2 cache
ran at 100 MHz. But the processor was whisking processes from the L1
cache at 400 MHz, while the L2 cache line was advancing at only
one-quarter of that speed. A slight bottleneck developed.

The Xeon's L2 cache on the GX chip operates at 400 MHz, the same
speed as the processor and L1 cache. So all the processes are in
lockstep, and the line for the roller coaster moves four times as fast.
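
The clock arithmetic behind the roller-coaster analogy can be sketched in a few lines of Python. This is only an illustration that uses the frequencies quoted above; the function name `l2_to_core_ratio` is hypothetical and does not come from the article.

```python
def l2_to_core_ratio(core_mhz: float, l2_mhz: float) -> float:
    """Return the L2 cache clock as a fraction of the processor clock."""
    return l2_mhz / core_mhz


# Figures quoted in the text above.
pentium_ii = l2_to_core_ratio(core_mhz=400, l2_mhz=100)  # L2 tied to the 100-MHz motherboard
xeon = l2_to_core_ratio(core_mhz=400, l2_mhz=400)        # L2 at full processor speed

print(pentium_ii)         # 0.25
print(xeon)               # 1.0
print(xeon / pentium_ii)  # 4.0
```

The last figure is the "four times as fast" in the analogy: it compares L2 clock rates only and says nothing about overall application speed.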

The Xeon's performance gains look... 

... point registers. The M2 has a latency of two clock cycles but is
fully pipelined, allowing it to complete an operation every clock cycle
when cache hits are successful.

Floating-point performance was a major shortcoming of the 6x86
relative to the Pentium. With the M2, Cyrix is making improvements,
including giving the FPU a dedicated bus to the cache to avoid the FPU
stalls that occurred with the 6x86 when the bus interface and FPU accessed
cache simultaneously.

Cyrix also has enhanced its TLB (translation lookaside buffer) design
by departing from the single 128-entry TLB on its 6x86 to a two-level
TLB on the M2. The M2's L1 TLB is small at 16-entry but is direct-mapped.
The small TLB helps improve performance by eliminating a pipeline stage for
address translation. Backing up the L1 TLB is a six-way set associative,
384-entry TLB. Branch prediction is also improved by increasing the BTB
(branch target buffer) from the 6x86's 256-entry BTB to a 512-entry BTB.

Intel also has . . .

... cache instead of sharing the data path with the bus interface unit.
In the 6x86, the FPU stalled if the bus interface was accessing the cache
when the FPU request occurred; in the M2, both can access it concurrently.
The other change in the memory system is inside the
memory-management unit. Cyrix's designers expected that the 128-entry
TLB in the 6x86 would become a clock-speed limiter at higher frequencies.
At the same time, traces of applications showed that TLB miss rates
increased with 32-bit applications, making an even better hit rate
important.

To achieve this goal, the M2 uses a two-level TLB. The relatively
small, 16-entry level-one TLB is direct mapped to support high clock
rates. It is backed by a 384-entry, six-way set-associative level-two
TLB. The fast level-one TLB (with an estimated 92% hit rate) avoids the
need for another pipeline stage for address translation during most
accesses. When misses in the level-one TLB occur, an estimated 99.6% of
accesses will hit in the level-two TLB and incur only a one-cycle
penalty. Both TLBs are dual-ported to support both program and data address
translations concurrently.
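
The two hit-rate estimates above can be combined into a rough breakdown of where translations are satisfied. The Python sketch below is illustrative only. It assumes the 99.6% figure applies to accesses that have already missed the level-one TLB, which is one reading of the sentence, and the function name `tlb_breakdown` is hypothetical.

```python
def tlb_breakdown(l1_hit=0.92, l2_hit_after_l1_miss=0.996, l2_penalty_cycles=1):
    """Split address translations by the TLB level that satisfies them.

    Assumption: l2_hit_after_l1_miss is the hit rate among level-one misses.
    """
    l1_miss = 1.0 - l1_hit
    l2_served = l1_miss * l2_hit_after_l1_miss
    missed_both = l1_miss * (1.0 - l2_hit_after_l1_miss)
    return {
        "l1_served": l1_hit,
        "l2_served": l2_served,
        "missed_both": missed_both,
        # Average extra cycles per translation from level-two hits only;
        # the cost of missing both levels is not given in the text.
        "avg_l2_penalty_cycles": l2_served * l2_penalty_cycles,
    }


result = tlb_breakdown()
for name, value in result.items():
    print(f"{name}: {value:.5f}")
```

Under that reading, a little under 8% of translations pay the one-cycle level-two penalty and roughly 0.03% miss both TLBs.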

Few Core Enhancements

To reduce the number of stalls due to mispredicted branches, the
number of entries in the branch target cache and branch history table
were doubled; each now has 512 entries. The organization and algorithms
remain unchanged.

The pipeline is nearly identical to that in the 6x86, with... 

... the pretzel representing the "0" or off value. Throw the switch,
and the electrons romp through the other loop for the "1" or on value.

Cache memory serves as a "holding pool" for memory and instructions
which are in frequent use by the processor. Programming code (in its most
primitive, machine code manifestation) is transferred from regular memory
into the cache memory. That allows the Pentium chip to access, and hence
execute, the instructions very quickly. Often, many motherboards rely on
what's called a "level 1" cache, which is a small amount of memory
built into the Pentium chip. (There are actually two 8K caches, one for
code and one for data.) High-performance motherboards utilize a "level 2"
cache, which is an external memory source like we have here. Generally,
the larger the level 2 cache, the better, but it's hard to predict
what the absolute increase in speed will be, since it depends on the mix of
machine-level instructions encountered in the current program.

This motherboard sports 256K of speedy NEC 8-nanosecond (that's
eight-billionths of a second access time, as opposed to 70 nanoseconds for
your main RAM) synchronous SRAM. The cache memory is synchronous in
that the 256K of cache memory runs at the same speed as the Pentium's
internal caches, offering up heinous performance.

The square chips... 

... doubling the bandwidth to memory without increasing the pin count.
The single bus also allows main-memory data to be stored directly into the
L2 cache as it is bypassed into the CPU. The downside is the small cost
of the external switches and the need to implement a 128-bit-wide DRAM
array.

If there is no other memory activity in progress, the CPU sends an
address to the main memory at the same time that it is broadcast to the
L2 cache (if present), minimizing DRAM latency. If the access misses
the L2 cache, the DRAM's row-access (RAS) time is already past; the
DRAM access is then completed by asserting CAS.

With 60-ns fast-page-mode DRAM, the CPU stalls for only 14 cycles on
an L1 cache miss (16 if there is an L2 cache, because RAS must be
delayed until the L2 miss is detected). Because of the nonblocking cache,
some of this latency can be hidden if the requested data is not
immediately required by another instruction. With the 128-bit bus, it takes
only two accesses to fill an L1 cache line, for a total of 22 (or 24)
cycles.
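
The line-fill totals above follow from simple addition, sketched here in Python. The 8-cycle cost of the second access is not stated in the text; it is inferred from the quoted figures (22 - 14 and 24 - 16), and the function name `line_fill_cycles` is hypothetical.

```python
def line_fill_cycles(first_access_cycles: int,
                     accesses_per_line: int = 2,
                     later_access_cycles: int = 8) -> int:
    """Total cycles to fill one L1 cache line over the 128-bit bus.

    later_access_cycles is an inferred value, not one given in the article.
    """
    return first_access_cycles + (accesses_per_line - 1) * later_access_cycles


print(line_fill_cycles(14))  # 22, no L2 cache present
print(line_fill_cycles(16))  # 24, RAS delayed until the L2 miss is detected
```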

The memory controller is quite flexible in its timing, allowing it to 
support a variety of... 

the Nx586 processor with FPU offers superior performance and
value."

The floating point unit is integrated into the Nx586's RISC86
microarchitecture, allowing full parallel operation of floating point
operations. The floating point versions of the MPU are packaged as
multi-chip modules (MCMs) utilizing IBM's flip-chip packaging...

...to the x86 instruction set, as well as incorporating the computer
architecture techniques of out-of-order execution, speculative execution,
register renaming, data forwarding and two-level branch prediction.

The new processors also continue to incorporate 32 kilobytes of level
one cache and an on-chip level-2 cache controller with a private
64-bit level-2 cache bus.

The Nx586-PF100 processor is sampling now with production availability
slated for December, priced at $285 each in quantities of 1,000. The Nx586

...ABSTRACT: for multimedia authoring, full-motion video and advanced 3-D
graphics. It owes its performance to the Dynamic Execution architecture,
which makes it both highly parallel and compatible with previous
generations of x86 processors. Dynamic Execution combines 'speculative'
out-of-order execution, register renaming, data-flow analysis and multiple
branch prediction techniques to eliminate idle time. A detailed technical
description of how these techniques work is presented. The new chip also
uses a built-in, 'non-blocking' Level 2 cache in addition to the
Level 1 cache found in the 486 and Pentium. Programmers should employ
such strategies as partial stall removal, data alignment and improved
branch prediction to optimize their code for the P6. A P6 compiler must
support all of these techniques to be effective.



21/3,K/10 (Item 10 from file: 275)

DIALOG(R) File 275: Gale Group Computer DB(TM)
(c) 2004 The Gale Group. All rts. reserv.

01615896 SUPPLIER NUMBER: 14358438 (USE FORMAT 7 OR 9 FOR FULL TEXT) 

TFP designed for tremendous floating point. (MIPS Technologies Inc.'s TFP
microprocessor) (includes related article on pricing and availability)
(Product Announcement)

Gwennap, Linley
Microprocessor Report, v7, n11, p9(5)
August 23, 1993

DOCUMENT TYPE: Product Announcement ISSN: 0899-9341 LANGUAGE:
ENGLISH RECORD TYPE: FULLTEXT

WORD COUNT: 3781 LINE COUNT: 00283

. . . pipeline but rearranges the stages to reduce the load-use penalty.
As this reordering increases the mispredicted-branch penalty, the chip uses
a large branch-prediction cache to reduce the number of mispredicted
branches.

The off-chip cache acts as a large first-level cache for
floating-point data as well as a second-level cache for instructions
and integer data. The tag and data SRAMs are interleaved, allowing the
processor to reach its bandwidth goal. The cache requires custom tag RAMs
and uses synchronous SRAMs for the data. Because of the extra expense of
the two-chip processor and high-performance cache, TFP will not replace
[illegible] the high end of the MIPS family; instead, it will satisfy
the need for maximum floating-point. . .

RAM; LocalTalk, parallel and serial ports (with automatic polling
that, as with the QMS network printers, allows it to accept input from all
three interfaces concurrently); a 250-sheet input tray; a 250-sheet
correct-order output tray and a 30-sheet face-up output tray. Optional
250-sheet cassette and. . .

...provides automatic halftoning of CMYK color values, thereby eliminating 
the need to emulate color in the application software. (For a more complete 
discussion of Level 2 and its benefits, see Vol. 5, No. 7.) 

Adobe has consistently argued that Level 2 enhancements don't 
render Level 1 machines obsolete--indeed, Adobe has hinted that it 
intends to provide new drivers that will bring some of the benefits of 
Level 2 machines to Level 1 printers. 

We don't buy this argument. We believe that Level 2 will be 
viewed by the market as the "next rev" of PostScript, which by definition 
makes Level 1 obsolete just as the release of PageMaker 4.0 rendered 
PageMaker 3.0 history. Our view received a boost by an ad that ran in... 

...That Fins (one of Dataproducts' OEM customers) loudly proclaimed the
arrival of its PostScript 2 printer and, after enumerating the real and
hyped benefits of Level 2, declared all Level 1 products to be
obsolete, useful only as "boat anchors." We rest our case.

. . . VS SBC/PKx allows high performance realtime applications to exploit
the capabilities of the K6. The K6 boasts integrated MMX technology
capability, enlarged 64Kbyte local (L1) cache, out-of-order execution,
dual pipelines, branch prediction and other advanced technologies all of
which enhance the board's performance. In addition, the VS SBC/PKx offers
users a number of interfaces with...

. . . Operating Systems such as VxWorks, Windows 95 and Windows NT are
[illegible] K6 CPU is supported by 512Kbytes of very fast, 64bit wide,
[illegible] (L2) cache using synchronous burst SRAMs. Up to 128Mbytes
[illegible] wide EDO-DRAM can be installed on the board using industry
standard simms. Up to 1Mbyte of Flash...

. . . bios.

An Ultra-SCSI, SCSI-2 or SCSI-1 interface is provided using the
Symbios Logic 33MHz 53C860 SCSI processor.

The interface supports single-ended asynchronous or synchronous
transfers with active termination and signal negation - data rates up to
20Mbytes/sec are supported. The interface is accessed via the P2 DIN
connector.

The. . .

... workstation will serve you well.

Technology analyst Andre Kvitka (andre kvitka@infoworld.com) has been
a hardware junkie at the InfoWorld Test Center for the past eight years.

Kayak system configuration

* Two 300-MHz Intel Pentium II CPUs
* 32KB of Level 1 cache per CPU
* 512KB of Level 2 cache per CPU
* 64MB of synchronous DRAM (expandable to 512MB)
* Adaptec AIC-7880 Wide Ultra SCSI interface
* Adaptec AIC-7860 Narrow Ultra SCSI interface
* Adaptec ARO-1130 SCSI accelerator, RAID controller with 16MB of cache
* Two 4.55GB Seagate Cheetah 4 LP ST-34501W 10,000-rpm SCSI drives
* RAID Level 0
* 24x IDE CD-ROM drive
* 1.44MB, 3.5-inch floppy drive
* Matrox MGA Millennium II with Accelerated Graphics Port (AGP)
  graphics accelerator with 8MB. . .

. . . Integrated Device Technology also has several RC5000-series devices
in its family. Like the NEC VR5000, the chips include dual 32-kbyte
instruction and data caches, a dual-issue floating-point ALU, and a
five-stage pipeline. The peak clock rates and thus, MIPS throughput,
though, are higher than the NEC...

...internal top clock frequencies of 250 MHz, the Toshiba TC86R4400
delivers a raw throughput of between 300 and 350 MIPS. Direct-mapped
instruction and data caches of 16 kbytes each tie into the external L2
cache over a 128-bit-wide bus, which supports fast line refills. An
on-chip memory-management unit employs a fully associative
translation-look-aside buffer...

...unit, designers at IBM and Motorola have developed the MPC750/740
processors. These CPUs can issue two or more instructions every cycle. They
pack large L1 instruction and data caches--32 kbytes each--and are
eight-way set associative. When clocked at 266 MHz, the design can deliver
a throughput of well over 300 MIPS. To get the high throughput, four
instructions are simultaneously fetched from the instruction cache.
Meanwhile, dual instructions are fetched from the branch-target instruction
cache. The processor performs speculative execution with dynamic
prediction. To improve the response to branches, they are processed
upstream of the dispatch unit.

Prior to the 750/740, IBM and Motorola did enhance the... 

... kvitka@infoworld.com) has been a hardware junkie at the InfoWorld
Test Center for the past eight years.

IBM system configuration

* Two 300-MHz Intel Pentium II CPUs
* 32KB of Level 1 cache per CPU
* 512KB of Level 2 cache per CPU
* 64MB of synchronous DRAM, expandable to 512MB
* Dual-channel Wide Ultra SCSI controller
* RAIDport option for RAID Level 0, 1 or 5
* IBM Scorpion 4.5GB 7200-rpm Wide SCSI hard drive
* Sony 24-speed IDE CD-ROM drive
* Permedia P2 video card with 8MB. . .

. . . NetMeeting, IBM VoiceType Simple Speaking Gold and LANClient
Configuration Manager, Lotus SmartSuite, NetFinity

Compaq system configuration

* Two 300-MHz Intel Pentium II CPUs
* 32KB of Level 1 cache per CPU
* 512KB of Level 2 cache per CPU
* 64MB EDO error-correcting code buffered DIMMs (expandable to 512MB)
* Wide Ultra SCSI host controller
* 4.3GB Wide Ultra SCSI 7200-rpm SMART...

... Intel's P6 marketing director, said the 32-bit dataflow engine has
the latest architectural features: out-of-order and speculative execution
and improved branch prediction.

Feeding this microengine is an 8-kbyte two-way instruction and 8-kbyte
four-way set associative L1 cache. In addition, a 256-kbyte L2 cache
is mounted on the same ceramic substrate as the P6 CPU, and both are
contained in the same package. Intel says combining the L2 cache with
the CPU ensures the highest performance and fastest time to market.

The L2 cache is on its own dedicated bus to the P6, thus allowing
a fast 1-Gbyte per second transfer rate. Main memory and PCI (peripheral
component . . .

...0 are on a separate 500-Mbyte/s 64-bit data and 36-bit address
transaction-based bus. Intel claims the bus can accommodate 200 concurrent
[illegible]-Mbit/s videostreams without affecting CPU performance. One other feature
of the P6 is that the CPU, L2 cache and memory bus are all non-blocking.

... and Pentium Pro CPUs, and then some. Like the former, it has the
instruction set for accelerating MMX software, as well as 32 KB of L1
(Level 1) cache memory, up from 16 KB in earlier chips; cache memory
"[illegible] ahead" to the next lines of program code, speeding up MMX and
non-MMX applications alike.

But like the Pentium Pro chip, the PII separates the bus on which
the L2 cache memory resides (L2 cache is where the processor turns
after the L1 cache for the next sequential bits of program code) from
the system bus, where main memory, or RAM, resides. Thus, the data in the
L2 cache and the larger chunks of data in main memory can travel
simultaneously on separate buses, a feature that contributes to greater
overall efficiency. Also, the PII doesn't have to finish processing one
piece of data before it can start on another, and it has an improved
ability to predict which of its branches will be working on program
instructions, to further smooth data flow.



